Mon. Not. R. Astron. Soc. 000, 000-000 (0000) 



Printed 2 February 2008 



(MN MfeK style file vl.4) 



Joint modeling of the probability distribution and power 
spectrum of the Lya forest : comparison with observations 

at z = 3 

Vincent Desjacques 1 and Adi Nusser 1 ' 2 

1 The Physics Department and the Asher Space Research Institute, Technion, Haifa 32000, Israel 

2 The Institute for Advanced Study, School of Natural Sciences, Einstein Drive, Princeton, NJ 08540 
- - i Email : dvince@physics.technion.ac.il, adi@physics.technion.ac.il 

O : 
O ' 
(N ' 

■ 2 February 2008 

s : 

ABSTRACT 

ON , We present results of joint modeling of the probability distribution function (PDF) 

and the one-dimensional power spectrum (PS) of the Lya forest flux decrement. The 
sensitivity of these statistical measures to the shape and amplitude of the linear matter 
power spectrum is investigated using N-body simulations of two variants of the ACDM 
cosmology. In the first model, the linear power spectrum has a scale-invariant spectral 
', index n s = 1, whereas in the second, it has a negative running index (RSI), dn/dln/c < 

f*^) ■ 0. We generate mock catalogs of QSO spectra, and compare their statistical properties 

— , ' to those of the observations at z = 3. We perform a joint fit of the power spectrum 

t^J" , and the PDF. A scale-invariant model with og = 0.9 matches well the data if the mean 

' IGM temperature is T <L5 x 10 4 K. For higher temperature, it tends to overestimate 

, the flux power spectrum over scales k <0.01 skin -1 . The discrepancy is less severe 

when the PS alone is fitted. However, models matching the PS alone do not yield a 
JL good fit to the PDF. A joint analysis of the flux PS and PDF tightens the constraints 

^ ■ on the model parameters and reduces systematic biases. The RSI model is consistent 

' with the observed PDF and PS only if the temperature is T >2 x 10 4 K. The best 

C$ • fit models reproduce the slope and normalisation of the column density distribution, 

irrespective of the shape and amplitude of the linear power spectrum. They are also 
. t-H , consistent with the observed line- width distribution given the large uncertainties. Our 

■ joint analysis suggests that eg is likely to be in the range 0.7-0.9 for a temperature 
5— i ' 1 <T <2 x 10 4 K and a reasonable reionization history. 
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1 INTRODUCTION 

Absorption features in the Lya forest provide direct in- 
formation on the large scale distribution of neutral hydro- 
gen in the highly ionized intergalactic medium (Bahcall & 
Salpeter 1965; Gunn & Peterson 1965; see e.g. Rauch 1998 
for a review). The Lya forest is believed to originate in a 
warm photoionized, fluctuating intergalactic medium (IGM) 
which smoothly traces the distribution of the dark mat- 
ter. This picture is sustained by hydrodynamical simula- 
tions and semi-analytical models which have succeeded in 
reproducing the observed statistics of the absorption lines 
and transmitted flux (e.g. McGill 1990; Bi 1993; Cen et al. 
1994; Zhang, Anninos & Norman 1995; Petitjean, Miicket & 
Kates 1995; Hernquist et al. 1996; Katz et al. 1996; Miralda- 



Escude et al. 1996; Hui, Gnedin & Zhang 1997; Theuns et 
al. 1998; Schaye 2001). The Lya forest can therefore serve 
as a probe the physical state of the IGM, and the under- 
lying matter distribution over a wide range of scales and 
redshifts (e.g. Croft et al. 1998; Nusser & Haehnelt 2000; 
Schaye et al. 2000). Several method have been proposed for 
recovering the shape and amplitude of the primordial power 
spectrum, A*(fc) (e.g. Croft et al. 1998, 2002b; McDonald et 
al. 2000; McDonald 2003). The current methods rely either 
on an inversion of the one-dimensional flux power spectrum 
A|(fe) (e.g. Croft et al. 1999, 2002b; McDonald 2003), or 
on a forward comparison between the data and the simula- 
tions (e.g. Zaldarriaga, Hui & Tegmark 2001; McDonald et 
al. 2004b). The nonlinear relation between the matter and 
flux power spectra is calibrated by means of detailed nu- 
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merical simulations (e.g. Croft et al. 1998; McDonald 2003). 
These methods are fundamentally limited by continuum fit- 
ting errors, metal contamination and nonlinear corrections 
(e.g. Hui et al. 2001; Zaldarriaga, Scoccimarro & Hui 2003; 
see also Gnedin & Hamilton 2002). Furthermore, one has to 
marginalize over the 'nuisance' parameters in order to place 
constraints on the spectral index n s and the power spectrum 
normalisation amplitude as . Underestimating the mean flux 
level (F), for example, can strongly bias the results as it was 
shown by Seljak, McDonald & Makarov (2003), and Viel, 
Haehnelt & Springel (2004) . The most recent analyses of the 
Lya forest favour a simple scale-invariant model with ag in 
the range 0.85-0.95 (McDonald et al. 2004b; Viel, Haehnelt 
& Springel 2004; Viel, Weller & Haehnelt 2004; Seljak et al. 
2004). However, one should emphasize that these studies do 
not include in their analysis other statistics than the power 
spectrum of the Lya forest. A priori, a model which fits the 
observed power spectrum solely will not necessarily agree 
with the observed PDF, bispectrum etc. It is therefore pru- 
dent to examine how other statistical measures of the forest 
constrain the cosmological model. 

In this work, we follow a different strategy and assess 
the validity of a cosmological model using three statistical 
measures of the Lya forest. These statistics are the one- 
dimensional flux power spectrum, A F (fc), the flux proba- 
bility distribution function (PDF), P(F), and the neutral 
hydrogen column density distribution f(N m ). Furthermore, 
we adopt the "forward approach" which consists in com- 
paring the observed statistical measures with those directly 
derived from N-body simulations of the cosmological model 
under consideration (see e.g. Zaldarriaga, Hui & Tegmark 
2001). This approach has the advantage that it does not 
require any inversion procedure to obtain the linear mass 
power spectrum. 

Combining several statistical measures could signifi- 
cantly affect the constraints on the shape and amplitude of 
the primordial power spectrum as inferred from A F (fc) alone. 
In this study, we will attempt to match simultaneously the 
three statistics mentioned above. In principle, higher order 
correlations such as the bispectrum should also be consid- 
ered since they provide independent information on the cos- 
mology (e.g. Mandelbaum et al. 2003). However, the Lya 
bispectrum as computed from high-resolution QSO spectra 
is still too noisy to provide significant constraints on the 
cosmological model (e.g. Viel et al. 2004a). To explore a 
parameter space as large as possible, we will implement a 
simple semi-analytical model which reproduces many prop- 
erties of the Lya absorbers (e.g. Bi, Borner & Chu 1992; 
Reisenegger & Miralda-Escude 1995; Bi & Davidsen 1997; 
Gnedin & Hui 1998) . We will generate a wide sample of mock 
catalogs and constrain the model parameters by comparing 
their statistical properties to the observations of McDonald 
et al. (2000). 

The paper is organized as follows. In Section §2 we 
briefly review the observational data we aim at fitting in 
this analysis. In §3.2.1, we detail the method used to com- 
pute synthetic spectra from pure dark matter (DM) simula- 
tions. The comparison between simulation and observations 
is done in §4. We discuss our results in §5. 



Table 1. The main parameters of the ACDM simulations con- 
sidered in the present paper. The normalisation of the RSI sim- 
ulations follows from the results of Spcrgcl et al. (2003). The 
spectral index n s and its derivative dlnn s /dlnfc are given at 
k = 0.07 /iMpc" 1 . The mass of a dark matter particle is about 
8 X 10 7 Mq jh in all the simulations. 







n A 


h as n s 


dlnris/dlnfc 


SI and S2 


0.30 


0.70 


0.70 0.90 1 





Rl and R2 


0.31 


0.69 


0.71 0.84 0.93 


-0.03 



2 OBSERVATIONAL DATA 

In this paper, we will compare the simulated statistics to 
the flux power spectrum and PDF measured by McDon- 
ald et al. (2000) (hereafter MOO), which were obtained from 
a sample of eight high-resolution QSO spectra. Regard- 
ing the flux power spectrum, we consider the data points 
in the range 0.005 < k < 0.04 skm -1 . The lower limit 
k = 0.005 skm -1 is set by the size of our simulations (cf. 
Section §3.1). Note that it is larger than the characteristic 
wavenumber k ~ 0.003 skm -1 below which continuum fit- 
ting errors are expected to become important (e.g. Hui et al. 
2001). The upper limit k — 0.05 skm -1 is chosen because of 
concern about metal contamination on smaller scales (e.g. 
MOO; Kim et al. 2004). The observed flux PDF is very sensi- 
tive to continuum fitting, especially in the high transmissiv- 
ity tail. We therefore exclude the data points with F > 0.8 
from the analysis to avoid dealing with continuum fitting er- 
ror. In addition, we discard also the measurement at F — 
since it is strongly affected by noise (e.g. Rauch et al. 1997). 
Finally, we emphasize that we work with the power spec- 
trum of the transmitted flux F, Ap(fc), which differs from 
that of the flux density contrast, 5p = (F/(F) — 1), by a 
multiplicative factor {F} 2 . A F (fc) has the advantage that it 
does not depend explicitly on the mean flux level (F). 



3 MODELLING THE Lya FOREST 

In this Section, we will first describe the DM simulations 
in detail. We will then briefly review the basic ingredients 
relevant to create realistic Lya spectra from simulations of 
DM only. 

3.1 The N-body simulations 

We run DM simulations of a ACDM cosmology with the 
N-body code GADGET (Springel, Yoshida & White 2001). 
Pure DM simulations are more suitable to our purposes since 
they allow us to explore a larger parameter space than full 
hydrodynamical simulations. To reproduce the Lya forest as 
seen in QSO spectra and, at the same time, resolve typical 
absorption systems, a large volume together with a high res- 
olution are requested (e.g. Theuns et al. 1998; Bryan et al. 
1999). For this reason, each simulation evolves 256 3 particles 
of mass m ~ 10 8 Mq//i in a periodic box of size 25 /i _1 Mpc. 
Since cosmic variance error on, e.g., the optical depth nor- 
malisation (at constant mean flux) can be relatively large 
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Figure 1. Left panel : The particle distribution in a slice of thickness Ah = 0.05 /i — 1 Mpc through the various ACDM simulations at 
z = 3 : the scale-invariant realisations SI and S2 (top panels), and the RSI realisations Rl and R2 (bottom panels). Right panel : The 
corresponding 3D power spectra A^(fc) computed at z = 3 from the DM particle distribution. The dashed curves show A^(fc) for the 
S2 and R2 simulations. The dotted curves show the linear power spectrum used to seed the fluctuations. 



when estimated from such a small volume (e.g. Croft et al. 
2002b), we have run two random realisations of each of the 
model we consider in the present paper. In the simulations 
which we denote SI and S2, the initial power spectrum is 
scale-invariant, n s = 1, and was obtained from fitting for- 
mulae of Eisenstein & Hu (1999). In the simulations Rl and 
R2, the initial power spectrum is that of the running spec- 
tral index model (RSI, see e.g. Spergel 2003). The present- 
day clustering amplitude in these simulations, erg = 0.84, is 
slightly lower than that of SI and S2, for which as = 0.9. 
The initial particle positions were computed using a parallel 
code kindly provided by Volker Springel. The cosmological 
parameters are summarized in Table 1. In the left panel of 
Fig. 1, we show a slice of thickness Ah = 0.05 /i _1 Mpc ex- 
tracted from the four simulations at redshift z = 3. Note 
that the simulations SI and Rl (S2 and R2) have the same 
random seed. 

To quantify to which extent the various simulations dif- 
fer, we plot in Fig. 1 the dimensionless 3D matter power 
spectrum A^(fc) at z = 3 for all the simulations. It was ob- 
tained by sampling the DM particles onto a 512 3 grid using 
a cloud-in-cell method, and taking the Fourier transform by 
means of a FFT routine. As we can see, Rl and R2 exhibit 
much less power than SI and S2. On scale k ~ 3 /iMpc -1 , 
which corresponds to the pivot point in the flux power spec- 
trum (cf. Figure 4), A^(fc) in the scale-invariant simula- 
tions is about twice as large as is in the RSI simulations. 
Cosmic variance becomes important on scale k <cl /iMpc -1 , 
where the matter power spectrum computed from the SI 
(Rl) realisation is larger than that of S2 (R2) by about 10- 
20%. We also show the linear power spectra which we use 
to seed fluctuations in the initial DM distribution as dotted 
curves. The largest scale of the simulations, k <^1 /iMpc -1 , 
are still in the linear regime at z = 3. Note also that, on 



scale k <fl.l /iMpc 1 , the power spectrum of the RSI model 
is very close to that of the n s = 1 model. 

3.2 Generating mock catalogs of QSO spectra 

3.2.1 The low density IGM 

Semi-analytical models of the Lya forest exploit the tight 
relationship between the gas and the underlying DM distri- 
bution (e.g. Bi 1993; Bi & Davidsen 1997; Petitjean et al. 
1995; Gnedin & Hui 1998; Croft et al. 1998). In this ap- 
proach, the gas temperature T s , and the neutral hydrogen 
density n HI are computed using tight power-law relations 
obtained from full hydrodynamical simulations (e.g. Katz et 
al. 1996, Hui & Gnedin 1997, Theuns et al. 1998), 

T g = f g (1 + S^- 1 and n HI = n HI (1 + 6 s ) a , (1) 

where the adiabatic index 7 is in the range 1 — 1.62, and 
a = 2 — 0.7 (7 — 1). T g and n HI are the temperature and 
neutral hydrogen density at mean gas density respectively. 
These scaling relations hold for moderate gas overdensities, 
i.e. S g <;50 . For higher overdensities however, we have to 
account for line cooling. To this extent, we follow Desjacques 
et al. (2004) and take the gas temperature to be T g = 10 4 K 
for 5p/p > 50 This cutoff ensures that the Doppler param- 
eter 6(x) is always 6(x) = 13 kms -1 in the high density 
regions of the simulation. 

When radiative cooling is inefficient, it is reasonable to 
assume that the gas density and velocity fields, 8 g and v g , 
are smoothed versions of their DM counterparts, <5 m and 
v m - We obtain the gas density distribution by sampling the 

* When pressure smoothing is properly accounted for, this ap- 
proximation indeed breaks down at much lower overdensities 
(Schaye 2001) 
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Figure 2. Synthetic spectra extracted from each of the simulations at z = 3. The vertical bars above the spectra show the position of the 
lines with fitted column densities JV HI exceeding 10 12,5 cm 2 . The comoving length of each spectrum is L = 25 h~ 1 Mpc, and corresponds 
to a redshift interval Az ~ 0.04 at z = 3. 



DM particles onto a 512 grid, and smoothing the resulting 
density field with a Gaussian filter W = exp(— k 2 /2k 2 ?) of 
characteristic length l/fa?. This choice is motivated by the 
fact that a Gaussian filter gives a good fit to the gas fluctu- 
ations over a wide range of wavenumber (e.g. Gnedin et al. 
2003). Furthermore, we expect k-p to depend on the phys- 
ical state of the IGM. However, since the relation between 
&f and T g depends noticeably on the reionization history of 
the Universe (Gnedin & Hui 1998), it is more convenient to 
treat &f as a free parameter (see e.g. Zaldarriaga, Hui & 
Tegmark 2001). We discuss gas smoothing in more detail in 
Appendix §A. 



3.2.2 The synthetic spectra 

Once we have smoothed the DM density and velocity fields 
on the comoving scale l/fcp, the flux distribution depends 
only on the mean flux (F), the adiabatic index 7 and the 
mean IGM temperature T4. For any value taken by the four- 
dimensional parameter vector p=(kF,(F),~/,T4), we generate 
mock catalogs as follows. For each of the simulations, wc 
randomly select 10 4 lines of sight (LOS) of comoving length 
L — 25 /i -1 Mpc. The optical depth r and the transmitted 
flux F — cxp(— r) are then computed along each LOS in 
512 pixels of width At; ~ 6 kms -1 according to the Gunn- 
Peterson approximation (Gunn & Peterson 1965). The reso- 
lution of the synthetic spectra is somewhat larger than that 
of the MOO data, where it is At> = 2.5 kms -1 . In order to 
create realistic Keck spectra, we should have computed the 
flux on pixels smaller than 2.5 kms -1 , smooth the spectra 
with a Gaussian of resolution 6.6 kms -1 (FWHM), and then 
resample them on pixels of 2.5 kms -1 . Unfortunately, FFT 
of three-dimensional regular grids is still prohibitively slow 
for meshes larger than 512 3 . Notwithstanding, we believe 



that the flux power spectrum and PDF will not be much af- 
fected. However, we should emphasize that it might not be 
the case of the line statistics, in particular the line width dis- 
tribution. In the last step, the value of h m is adjusted such 
that the mean flux (F) of the whole sample matches the de- 
sired value (e.g. Rauch et al. 1997). To account for the noise 
in the observations, we add a uniform Gaussian deviate of 
dispersion a = 0.02 per interval of width An — 2.5 kms . 

In Fig. 2, we plot four synthetic spectra selected from 
each of the simulations. The spectra were extracted from 
mock samples which, for a given simulation, give an accept- 
able fit to the observed statistics of the Lya forest (cf. Sec- 
tion §4). In particular, the mean IGM temperature is respec- 
tively Ti = 1.5 and 2 for the scale- invariant (top panels) and 
RSI models (bottom panels). Ticks above the spectra mark 
the Lya absorption lines which were identified with the spec- 
tral fitting program VPFIT (Carswell et al. 2003). The lines 
are distinguished by their column density N m (in cm -2 ) 
and their width or Doppler parameter b (in kms -1 ). This 
fitting technique assumes that the fundamental shape of the 
lines is a Voigt profile, which is a good approximation for 
column densities N HI <C10 17 cm -2 relevant to the Lya forest. 
Absorption systems with column density iV HI < 10 12 ' 5 cm 2 
are quite uncertain and are not shown on the figure (e.g. 
Bi & Davidsen 1997). The spectra extracted from Rl and 
R2 feature somewhat more lines than those extracted from 
SI and S2. The reason is the larger clustering of the scale- 
invariant models. As a result, blending of absorbers to form 
one strong line occurs more often, and cause the number 
of lines identified with VPFIT to be lower than that in the 
Rl and R2 models. Note that, in practice, associated metal 
lines can help fixing the number of subcomponents. We will 
discuss the properties of the simulated column density dis- 
tribution in Section S4.4. 
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Figure 3. Constraints in the plane fcp - 7 for a scale-invariant cosmology with erg = 0.9 (top panels), and a RSI cosmology (bottom 
panels). The contours show the 1 and 2a confidence levels (assuming a Gaussian likelihood). The mean IGM temperature is T^ = 1 (left 
panels), 1.5 (middle panels) and 2 (right panels). The constraint arising from measurements of the flux PS solely is shown as dashed 
contours, while that arising from the combination of the flux PS and PDF is shown as solid contours. The stars and diamonds mark the 
best fit models obtained from the PS and PS+PDF data respectively. 



4 COMPARISON WITH OBSERVATIONS 

In this Section, we compare the statistical properties of our 
mock catalogs to that of the observations. We perform a \ 2 
statistics for the observed flux power spectrum and PDF. 
We also compare the simulated line column density and line- 
width distributions to observational data for several models. 



4.1 The parameter grid 

For each value of our parameter vector p=(kF,(F),~/,T4), we 
generate mock catalogs of ten thousand lines of sight. We 
let the parameters assume the following values, 

fc F = 4.55,5,5.56,6.25,7.14,8.33,10,12.5 

7 = 1,1.1,1.2,1.3,1.4,1.5,1.6 

f 4 = 0.5,1,1.5,2,2.5 

(F) = 0.60,0.61,0.62, •••,0.71,0.72,0.73 

The values of /cf (in unit of /iMpc -1 ) were chosen such 
that 1/rvF uniformly spans the range 0.08 - 0.22 /i _1 Mpc. 
Our discrete grid thus contains 8x7x5x14=3920 models. 
For each mock catalog, we then calculate the flux power 
spectrum and the flux PDF, and average over the SI and 
S2 (resp. Rl and R2) realisations. We then determine the 
goodness of fit of any model in the grid by computing a \ 2 
statistics from the difference between the simulated PS, PDF 
and the observational data. We account for measurements 
of the mean flux (F) by adding a term ((F) — F) 2 /ap to the 
chi-squared, where F and ap are the observed mean flux 
and error bar at z = 3. Note that the number of flux PS 
and PDF measurements used in the analysis is 10+15=25. 
Regarding the mean flux level, we will hereafter adopt the 



default value of MOO, F = 0.684 and ap = 0.023 (see Table 1 
of their paper). We should however keep in mind that the 
presence of metal lines or strong absorption systems can 
substantially affect the determination of (F) (e.g. Schaye 
et al. 2003; Viel et al. 2004b). We will discuss the effect of 
changing the mean flux level in Section §4.5. 

The problem of finding the best fit models is equivalent 
of finding the maximum of some hypersurface. Since the true 
maximum does not generally lies exactly at a grid point, it 
is desirable to interpolate over the different parameters. To 
this extent, we take advantage of the smooth dependence 
of the flux PS and PDF on the parameter vector, and use 
cubic spline interpolation. This method is known to work 
substantially better than a simple multi-linear interpolation 
(e.g. Tegmark & Zaldarriaga 2000). To marginalize over a 
subset of the parameters, we fix their values to that of the 
best fit model of the subspace, which we obtained from the 
spline interpolation. To check the accuracy of the interpo- 
lation, we computed the flux PS and PDF from the simu- 
lations for several randomly selected models which do not 
lie at a grid point. We found that the flux PS and PDF of 
the spline-interpolated models differ from that of the 'true' 
models by <J0.5% only in the range of scales considered in 
our analysis. 

The uniformity of space along the line of sight implies 
that the different A;— modes of the flux PS are uncorrelated. 
Uneven sampling of the forest along the line of sight or re- 
moval of some regions in the data (such as saturated lines) 
will introduce covariance between the modes. Yet the covari- 
ance matrix of the observed flux power spectrum is reason- 
ably close to diagonal over the scales relevant to our anal- 
ysis (Croft et al. 2002, Viel, Haehnelt & Springel 2004). 
We will thus neglect correlations between data points in 
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the flux PS. In the case of the PDF, however, the data 
points are strongly correlated. As a result, the \ 2 distri- 
bution as computed from the diagonal elements of the error 
matrix solely might differ noticeably from that computed 
from the full error matrix (see e.g. MOO; Gnedin & Hamilton 
2002). The x 2 should therefore be computed from the full 
covariance matrix (available at http://www.astronomy.ohio- 
state.edu/jordi/lya). This covariance matrix reflects the ran- 
dom errors in the flux measurement and, more importantly, 
the errors resulting from the finite number of QSO spectra. 
It was estimated using a Monte-Carlo procedure based on 
a variation of the bootstrap method (Press et al. 1992; see 
Appendix B of MOO for the details). Including the full co- 
variance matrix could have a strong effect on the results, 
especially when the errors are highly correlated as it is cer- 
tainly the case in the flux PDF. It is therefore important 
to assess the sensitivity of our analysis to the details of the 
covariance matrix. To this purpose, we will initially present 
results obtained by setting to zero the off-diagonal elements 
of the covariance matrix. In Section §4.3.3, we will repeat 
the analysis using the full covariance matrix of the flux PDF, 
and discuss to which extent the inclusion of the covariance 
matrix affects the inferred constraints. 

Since the spectral fitting program is relatively slow, we 
have not included the line distribution in the \ 2 statistics. 
We will compute the line statistics only for models which 
best fit the PS and PDF (cf. Section §4.4). 

4.2 The flux power spectrum and probability 
distribution 

4-2.1 Constraint in the kp-'y plane 

Fig. 3 shows the constraints in the plane fc F - 7 for the 
scale-invariant cosmology with as — 0.9 (top panels) and for 
the RSI cosmology (bottom panels). The mean IGM tem- 
perature is Ti = 1 (left panels), 1.5 (middle panels) and 
2 (right panels). The contours show the 68.3% and 95.5% 
confidence levels (assuming a Gaussian distribution for the 
likelihood) obtained from the flux power spectrum alone 
(dashed curves), and from a combination of the flux PS and 
PDF data (solid curves). There is obviously a degeneracy be- 
tween the filtering wavenumber fa? and the adiabatic index 
7. For illustration, we plot in Fig. 4 the PS and PDF of three 
scale- invariant models which fit the PS and PDF data at the 
la level. The IGM temperature is T^ = 1.5. The models are 
plotted together with the measurements of MOO, shown as 
empty symbols. As we can see, although these models differ 
in the value of &f and 7, they all yield very similar power 
spectra and PDFs. To understand the origin of this degener- 
acy, note that increasing 7 generally reduces the width of the 
weak lines (r <Jl) relative to that of the strong lines (r J>1). 
This follows from the fact that the low column density Lya 
forest arises in gas of low overdensity. Consequently, increas- 
ing 7 (at constant mean flux) amounts to decreasing Ap(fc) 
on small scale, and thereby mimics the effect of a smaller 
kp. In the PDF, increasing 7 enhances the fraction of trans- 
missivity pixels in the range F J>0.5 presumably because, at 
constant mean flux, the lower optical depth normalisation 
and the reduction in the width of the weak lines conspire 
to increase the fraction of high transmissivity pixels. As a 
result, the combination of the PS and PDF cannot break 



the degeneracy since both statistics exhibit a similar depen- 
dence on /cf and 7. In reality however, one would expect the 
filtering length to depend on T g and 7 (see Appendix §A) 
. Therefore, additional assumptions on the reionization his- 
tory of the Universe could break this degeneracy. In this 
respect, the observed line-width distribution suggests that, 
around z = 3, there is a sharp increase in T g together with 
a decrease in 7 (Schaye et al. 2000; Ricotti, Gnedin & Shull 
2000; see however McDonald et al. 2001). However, the data 
are too noisy to constrain T4 and 7 significantly. 



4.2.2 The best fit models 

In Fig. 3, the stars and squares symbols mark the mod- 
els which best fit the PS and PS+PDF data, respectively. 
These models are compared in Fig. 5 to the MOO data. The 
temperature of the scale-invariant models was chosen to be 
T4 = 1.5, while the RSI models have T4 = 2. This choice is 
motivated by the fact that, in a scale-invariant cosmology 
with erg = 0.9, models with T4<T.5 fit the data best whereas, 
in a RSI cosmology, the best fit are obtained for T4 ;>2 
(cf. Table 2). The RSI models match the observed A F (fc) 
somewhat better than the scale-invariant models. However, 
they tend to overestimate the PDF in the range F >0.6, 
which roughly traces regions of gas overdensity <5 g <;0. This 
follows from the relative lack of small-scale power, which 
translates into larger values of the PDF in the high trans- 
missivity tail. Consequently, matching the flux PDF in a 
RSI cosmology requires a low filtering length. Indeed, al- 
though the RSI models which best fit the PS data alone have 
fc F ^10 /iMpc" 1 , those which best fit the PS+PDF data 
have £;f <;11 ftMpc -1 , close to the largest value assumed by 
&f in our parameter grid. Since the Nyquist frequency of 
the grid is &N y ~ 10.2 /iMpc -1 , these models are certainly 
affected by numerical resolution. However, numerical reso- 
lution also contributes to the smoothing (e.g. Zaldarriaga, 
Hui & Tegmark 2001), and k-p = 11 /iMpc -1 is merely a 
lower limit. Note that we have kp ~ 6 — 9 /iMpc -1 for the 
best scale-invariants models, a filtering scale which is robust 
to resolution issues. 

Fig. 5 also clearly demonstrates that models which 
match best the PS data alone do not yield a good fit of 
the PDF. The constraints inferred from measurements of 
Ap(fc) alone might thus be significantly biased. It is there- 
fore important to combine the PS and PDF statistics to 
ensure that both are correctly reproduced in the simula- 
tion. To quantify the effect of adding the PDF, we plot in 
Fig. 6 the flux power spectrum and PDF of scale-invariant 
models for several values of T g . Models which fit best the 
PS and the PS+PDF data are shown in the top and bot- 
tom panels, respectively. When the PS data alone are fitted 
, the agreement between the observed and simulated flux 
PDF worsens with increasing temperature. For T4 J>2, the 
disagreement is very severe. A comparison between the top 
and bottom left panels reveals that including the PDF in the 
X 2 statistics increases the amplitude of flux power spectrum 
by an amount of ~10% on scale k <^0.01 skm -1 . Increas- 
ing the temperature produces a similar but weaker effect : 
the large-scale amplitude of A|(fc) decreases by about 5% 
when the temperature increases from T4 = 1 to 2. Note that 
the strength of this effect is consistent with findings from 
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Figure 4. Three scale-invariant models of normalisation erg = 0.9 which give an acceptable fit to the observed flux PS and PDF. The 
mean IGM temperature is Ta= 1.5. The reduced chi-squared are x 2 =23.5, 23.2 and 22.8 (for 23 degrees of freedom) for 7 = 1.5, 1.3 and 
1.1 respectively. Only the data points which lie between the two vertical lines were used to compute the \ 2 statistics. 




Figure 5. A comparison between models which best fit the PS data alone (dashed curves), and models which best fit the PS+PDF 
data (solid curves). They are shown for the erg = 0.9, scale-invariant cosmology (top panels), and for the RSI cosmology (bottom 
panels), for a fixed temperature T4 = 1.5 and 2 respectively. For the PS data only, the best fit values of the other parameters are 
(fc F , (F), 7)=(7.7, 0.71, 1.42) and (9.5,0.69,1) in the n s = 1 and RSI models respectively, whereas with the PS+PDF data, they arc 
(8.7,0.69,1.6) and (11.5,0.70,1). 



hydrodynamical simulations (e.g. Viel, Haehnelt & Springel 
2004). Combining the PS and PDF data could therefore re- 
duce the best fit value of as by about 10%. We also expect a 
degeneracy between the as and the mean IGM temperature. 
The data could be equally matched with a lower as and a 
larger IGM temperature. We will discuss this issue in more 
detail in Section §4.3. 

The photoionisation rate T_i2 of the neutral hydrogen 
(in unit of 10 _12 s _1 ) can be estimated from the optical 
depth normalisation To, which can be written as 



ro = 2.31^)f 4 - - 7 (i±^) 6 , (2) 

where A(z) = {Q. b h 2 / 0.02) 2 #i o(z) _1 r-i2(z) _1 . Here, Q b 
is the baryon content and H (z) the Hubble constant in unit 
of 100 kms^Mpc -1 (e.g. Rauch et al. 1997; McDonald et 
al. 2000). To calculate r_i 2 , we assume Q b h 2 = 0.02, in 
agreement with the constraint derived from CMB and deu- 
terium abundance measurements (e.g. Spergel et al. 2003; 
Buries & Tytler 1998). For the models of Fig. 5 which best 
fit the PS+PDF data, the optical depth normalisation is 



© 0000 RAS, MNRAS 000, 000-000 



8 Desjacques et al. 




Figure 6. Scale-invariant models of rms fluctuation erg = 0.9 which fit best the PS data solely (top panels), and the PS+PDF data 
(bottom panels) as a function of the mean IGM temperature. The temperature has a fixed T4 = 1 (solid), 1.5 (long-dashed), 2 (short- 
dashed) and 2.5 (dotted curves), while the other parameters were varied. The best fit parameter values obtained with the PS+PDF data 
are listed in Table 2. 



Table 2. Parameter values of scale-invariant and RSI models 
which best fit the PS+PDF data, for a mean IGM temperature 
T 4 = 1, 1.5, 2 and 2.5. The filtering k F is in unit of MVipc -1 . The 
last column gives the chi-squared for 23 degrees of freedom. Note 
that, since we spline interpolate over the parameters, the best fit 
values do not necessarily lie at a grid point. 



ro ~0.9 (scale-invariant) and ~0.8 (RSI). The inferred pho- 
toionisation rate for both best fit models is T_i2 ~ 0.6, 
a value consistent with other estimates (e.g. Rauch et al. 
1997; McDonald & Miralda-Escude 2001). Note, however, 
that this does not ensure that our simulations fully resolve 
the Lycv forest (cf. Section §3.2.2). 
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4.3 Sensitivity to the clustering amplitude 

As we discussed before, a scale-invariant model with a clus- 
tering amplitude of as = 0.9 matches best the data when the 
IGM temperature is low. Nonetheless, we could improve the 
agreement with the data in the range T4 J>1.5 by decreasing 
the rms density fluctuations. To assess the sensitivity of the 
statistics to as, we take advantage of the self-similarity of 
gravitational clustering in EdS cosmology (a good approxi- 
mation at redshift 2 ;>2). As a result, a snapshot at redshift 
different from 2 = 3, once rescaled, can mimic a cosmology 
with a different as- Since the fluctuation amplitude of the 
RSI model is tightly constrained by the CMB and Lya forest 
data (e.g. Bennett et al. 2003; Spergel et al. 2003), we will 
vary as in the scale-invariant cosmology only. We consider 
snapshots separated by a redshift interval Az = 0.2 in the 
range 2 = 2 — 4. These snapshots thus mimic scale-invariant 
cosmologies of rms density fluctuation 0.7 <.as <^1T- 

4-3.1 Correlation between as and T g 

In Fig. 7, we compare the PS (top panel) and the PDF (bot- 
tom panel) of several best fit models with different clustering 
amplitudes : as = 0.72 (solid), 0.82 (long-dashed), 0.9 (short 
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dashed) and 1 (dashed-dotted curves). We also show the best 
fit RSI model as a dotted curve. The mean IGM temperature 
was chosen to be T4 = 1.5. For sake of completeness however, 
we list in Table 2 the parameters values of best fit models 
for a mean IGM temperature in the range 1 < T g < 2.5. 
A comparison between the scale-invariant as = 0.82 model 
and the RSI which has a very similar as , demonstrates that 
the best fit power spectrum on scale k <J0.01 skm -1 is very 
sensitive to the small-scale behaviour of the matter power 
spectrum. Indeed, the difference in the flux power spectrum 
of these two models mainly results from including the PDF 
in the chi-squared statistics. This illustrates the advantage 
of combining several statistics of the Lya forest in constrain- 
ing the shape of the linear power spectrum. 

The scale-invariant models fit the flux power spectrum 
and the PDF reasonably well. However, the large-scale am- 
plitude of their flux PS increases with as, and causes the 
models with as >0.9 to overestimate the data on scale 
k <J0.01 skm -1 . To assess the significance of this effect, we 
need an estimate for the cosmic variance error in the flux 
PS. To this purpose, we ran two additional simulations (S3 
and S4) of a scale-invariant cosmology with normalisation 
as = 0.9 and computed the flux PS and PDF for the best 
fit models of Fig. 7. Then, from the sample of simulations 
SI — > SA, we computed the flux PS and PDF for the six pos- 
sible combinations of two simulations. The cosmic variance 
error was then the la scatter around the mean. For clar- 
ity, we use an offset and plot in Fig. 7 the cosmic variance 
as errorbars in the bottom of each panel. The errorbars are 
shown for the model with as = 0.9 only, as we found that 
they do not change much among the models. As we can 
see, they are small, about <;3% and <J2% in the flux PS 
and PDF respectively. However, since our simulations lack 
of large-scale power, they most probably underestimate the 
true cosmic variance. Yet the flux PS of the models with 
as ;>0.9 would still confidently lie above the data points 
on scale k <J0.01 skm -1 , even if the errors were twice as 
large. This demonstrates that, at fixed temperature, A|(fc) 
increases with as on scale k <J0.01 skm -1 . Consequently, 
there is a degeneracy between the temperature and cr 8 . A 
model with larger as requires a lower temperature to match 
the observed flux power spectrum. Note, however, that the 
measurement errors are large in that range of wavenumber. 
As a result, although the models with as ^0.9 shown in 
Fig. 7 predict a flux power spectrum which, "by eye" , over- 
estimates the observations, they are still consistent with the 
data in a \ 2 sense at least (The worst chi-squared, \ 2 = 25.4 
for 23 degrees of freedom, is obtained for as = 1, and corre- 
sponds to a fit probability ~30%). 

To illustrate the correlation between the mean IGM 
temperature and as, we plot in Fig. 8 the difference A\ 2 = 
X 2 ~ Xmin as a function of as, after marginalizing over the 
mean flux level (F). The other parameters do not vary in 
the chi-squared minimization. The mean IGM temperature 
is T g = 1 (left panel), 1.5 (middle panel) and 2 (right panel), 
the filtering is far =8.3 (solid curve), 7.1 (long-dashed) and 
6.3 /iMpc -1 (short-dashed), and the adiabatic index has a 
fixed value 7 = 1.3. These values of T g and 7 are consistent 
with that inferred from observations at z ~ 3 (e.g. Schaye 
et al. 2000; Ricotti, Gnedin & Shull 2000; McDonald et al. 
2001). The reason for selecting particular values of 7 and &f 
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Figure 7. A comparison between the flux PS (top panel) and 
PDF (bottom panel) of scale-invariant models with rms den- 
sity fluctuation cr 8 = 0.72 (solid), 0.82 (long-dashed), 0.9 (short 
dashed) and 1 (dashed-dotted curve). The RSI model is shown as 
dotted curves. The IGM temperature has a fixed value T4 = 1.5, 
while the other parameters are varied to obtain the best agree- 
ment with the data. The best fit models are listed in Table 2. In 
the bottom of each panel, the errorbars show our estimate of the 
cosmic variance for the model with erg = 0.9 (cf. text). 

follows from the fact that marginalizing over these param- 
eters proves difficult because of their degeneracy. The large 
measurement errors worsen the situation. However, we can 
take advantage of the fact that the subspace &f-7 has one 
effective degree of freedom. Namely, we can fix the adiabatic 
index 7 and select several values of the filtering length &f 
without restricting the analysis. Note that the wiggles which 
appear in A\ 2 are caused by numerical instabilities in the 
spline interpolation. Fig. 8 demonstrates that, at a given 
filtering kp, decreasing the temperature increases the nor- 



© 0000 RAS, MNRAS 000, 000-000 



10 Desjacques et al. 




Figure 8. Constraint on <rg from a fit to the observed flux PS and PDF after marginalization over the the mean transmitted flux. Only 
the parameters as and (F) were varied. The curves show A\ 2 = X 2 ~ X^in ^ or a mean IGM temperature T4 = 1 (left panel), 1.5 (middle 
panel) and 2 (right panel). The filtering is kp = 8.3 (solid curve), 7.1 (long-dashed) and 6.3 /iMpc -1 (short-dashed). The adiabatic index 
has a fixed value 7 = 1.3. 



malisation a 8 of the best fit models ( A\ 2 = 0) . Moreover, at 
fixed temperature, decreasing the filtering &f also increases 
the best fit cr 8 . For e.g. f 4 = 1.5, we find <7 8 « 0.71, 0.83 and 
0.96 for fcF=8.3, 7.1 and 6.3 /iMpc -1 . In other words, one has 
to increase the clustering amplitude in order to compensate 
for the larger smoothing. It would therefore be possible to 
match the data with a normalisation as J>1 if both the filter- 
ing and the IGM temperature were very low, fcp ~ 6 /iMpc -1 
and T4 w 1. Indeed, a model where &f = 6.3 /iMpc -1 and 
T 4 = 1 has a best fit value as — 1.06, and fits the data 
with an acceptable chi-squared \ 2 = 22.4 for 24 degrees of 
freedom. 



4-3.2 A temperature- dependent filtering 

As we discussed in the previous Section, a model with a 8 ;>1 
can match the data if both T g and &f are very low. In reality, 
however, these two parameters are not independent since the 
filtering &f to increase with the temperature (when all the 
other parameters are fixed). To make further progress, we 
will assume that the filtering k-p is solely a function of the 
mean IGM temperature T s . In Appendix §A, the dependence 
of &f on the IGM temperature and the reionization history 
of the Universe is discussed in detail. In brief, we expect the 
filtering scale to decrease with increasing temperature as 
&f oc T^ 1 / 2 , and to be smaller than ~ 14 /iMpc -1 at z = 3 
for reasonable reionization scenarios (cf. Appendix §A). 

To assess the effect of such an assumption on the con- 
straint inferred from the flux PS and PDF, we plot in the 
left panel of Fig. 9 A\ 2 as a function of as after marginal- 
izing over the mean flux. Only as and (F) were varied to 
obtain the flux PS and PDF of the best fit models shown 
in the right panel. We assumed a temperature-dependent 
filtering scaling as kp = 102V 1 / 2 (curves with triangles) 
and 8T4 -1 / 2 /iMpc -1 (curves with squares). The results are 
shown for different values of the temperature, T 4 = 1 (solid 
curves), 1.5 (long-dashed) and 2 (short-dashed), but for a 
fixed value of the adiabatic index, 7 = 1.3. The param- 



eter values of the best fit models are listed in the upper 
half of Table 3. The temperature-dependent filtering causes 
the best fit value of as to increase with the temperature. 
The rms fluctuations even reaches 0.9 — 0.95 for a filtering 
kp = 102V 1 / 2 and a temperature T4 <jl.5. Most impor- 
tantly, the amount of filtering significantly matters, as the 
best fit parameters are rather sensitive to the normalisa- 
tion of the relation fcF(TV). At fixed temperature, a scaling 
kp = 82V 1 / 2 leads to best fit values of as which are 10- 
20% larger than a scaling kp = 102V 1 / 2 . This strongly sug- 
gests that the Gaussian filter approximation (e.g. Gnedin 
et al. 2003) is certainly not accurate enough to constrain 
the clustering amplitude to better than ~10%. A close look 
at the right panel of Fig. 9 also reveals that the models 
with fcp(l) = 10 /iMpc -1 T look consistent with the data, 
whereas those with kp(l) = 8 /iMpc -1 tend to overestimate 
Ap(fc) on scale k <C0.01 skm -1 , especially when the temper- 
ature is large, 24^1.5. The best fit values of the clustering 
amplitude inferred from a temperature-dependent filtering 
&f < WTg 1 ^ 2 are a ^0.7. However, since the Nyquist fre- 
quency of the grid is &N y ~ 10.2 /iMpc -1 , we cannot robustly 
probe the range of wavenumber fcp(l) ^10 /iMpc -1 . Conse- 
quently, our analysis cannot exclude values of as as low as 
as < 0.7. However, since fep(l) = 14 /iMpc -1 is probably 
an unrealistically high value for the filtering wavenumber at 
redshift 2 = 3 (Appendix §A), we expect 0.7 <,as <J0.9 from 
the present results for a reasonable choice of reionization 
history, and an IGM temperature in the range 1 <;T4 <2. 

4-3.3 Effect of including the full PDF error matrix 

As yet we have set to zero all the off-diagonal elements of 
the error matrix. We now present results with the full er- 
ror covariance matrix as given in McDonald et al. (2001). 
A comparison between the results in the two cases allows 

t fc P (l) stands for k F (f 4 = 1). 
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Figure 9. Left panel : A% 2 after marginalizing over the mean flux level. Only us and (F) were varied to obtain the best fit parameters. The 
mean IGM temperature T4 = 1 (solid curves), 1.5 (long-dashed) and 2 (short-dashed). The filtering scales with T^as fcp = 8T4 -1 / 2 (curves 
with squares) and 10 TT -1 / 2 ftMpc -1 (curves with triangles). The adiabatic index has a fixed value 7 = 1.3. The spline interpolation 
was performed using the diagonal elements of the PDF covariance matrix solely. Right panel : The flux power spectrum and PDF of the 
best fit models (A\ 2 = 0). The IGM temperature is T4 = 1 (solid curves), 1.5 (long-dashed) and 2 (short-dashed), for a fixed adiabatic 
index 7 = 1.3. The best fit values of as, (F) together with the chi-squared are listed in the upper half of Table 3. 



an assessment of the robustness of the inferred best fit pa- 
rameters. Since we are interested in the measurements with 
0.05 < F < 0.75, we obtain the desired covariance matrix 
by setting the errors of the data points outside that range 
to infinity. Curves of A\ 2 versus <j$ are plotted in the left 
of Fig. 10. In the panel to the right of the same figure flux 
power spectra and PDFs corresponding to the best fit pa- 
rameters are shown. Values of the best fit parameters are 
listed in the lower half of Table 3. Taking into account the 
covariance matrix of the flux PDF in the chi-squared mini- 
mization introduces a systematic shift in the central values 
of the best fit parameters. Yet the effect is rather small. The 
clustering amplitude, ag, gets lower by only a few percents 
in most cases. The wiggles in the chi-squared curves in the 
left panel of Fig. 10 lead to a highly non-gaussian distribu- 
tion for the statistical errors on the best fit parameters. As 
a result, assessing to which extent the errors on the best 
fit values of as compare with those inferred from the diag- 
onal elements only proves difficult. However, we found that 
at fixed as and T g , including the off-diagonal elements in- 
creases the confidence levels in the plane - 7 by 5-10% 
(cf. Section §4.2.1). Finally, a comparison between the right 
panels of Fig. 9 and 10 reveals that, when the full covariance 
matrix is included, the flux PDF of the best fit models falls 
systematically below the data points in the range <;F <J0.5. 
This kind of behaviour is likely to occur when the errors are 
strongly correlated, as it is the case for the flux PDF. The 
curve for which the likelihood is the largest is not necessarily 
the one which goes through the error bars attached to the 
data points. 



4.4 The observed line statistics 

To assess whether the best fit models reproduce the observed 
line statistics, we compute the column density and line width 
distributions, and compare our results to observational data. 

4-4-1 The column density distribution 

The differential line density distribution f(N HI ) is defined as 
the number of lines per unit column density and per unit ab- 
sorption distance (see e.g. Tytler 1987; Schaye 2001), which 
is dX/dz £s f2^ 1//2 (l + z) 1 / 2 at redshift z ~ 3 in the stan- 
dard ACDM model. However, to facilitate the comparison 
with the measurements of Petitjean et al. (1993) and Hu et 
al. (1995), who assumed a deceleration parameter qo = 0, wc 
take the absorption distance to be dX/dz = (1 + 2). In the 
left panel of Fig. 11 we plot the column density distribution 
for several models which give a good fit of the PS and PDF 
data (xV^ <;!)■ /(-^hi) was calculated from a sample of 40 
LOS. The simulated curves are compared to the data of Hu 
et al. (1995) and Petitjean et al. (1993), which are shown as 
filled circles and squares respectively. The mean redshift of 
both samples is z = 2.8, close to that of the MOO data con- 
sidered in the present paper. We plot f(N m ) for two scale- 
invariant, as — 0.9 models with different values of far and 7 : 
(&f,7) = (8.3,1.5) (solid curve) and (6.3,1.1) (solid-dotted 
curve). In addition, we show f(N HI ) for a cold (T4 = 1.5) 
and a hot (T4 = 2.5) scale-invariant model of normalisation 
as = 0.72 as long- and short-dashed curves respectively. 
The RSI model is shown as a dotted curve. Absorption lines 
with column density 10 12 5 < N HI < 10 15 B cm~ 2 contribute 
most to the Lya forest. In that range, the simulated col- 
umn density distributions agree very well with the data. In 
particular, the slope and the normalisation of the simulated 
distributions coincides with that of the observations. The 
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later also tends to steepen in the range N HI JjlO cm~ . 
Such a deviation from a single power law is also found in 
the observations (e.g. Petitjean et al. 1993; Kim et al. 1997). 
Fig. 11 demonstrates that, although the flux PS and PDF 
of the best fit models is rather sensitive to the normalisa- 
tion amplitude and the relative amount of small-scale power 
(cf. Section §4.3), the column density distribution barely 
changes among the various models, in agreement with the 
results of Theuns, Schaye & Haehnelt (2000) who found that 
f(N HI ) is insensitive to the amount of small scale power. If 
most of the lines identified by VPFIT have a column density 
N m £10 15 ' 5 cm" 2 (~ 10 3 for a sample of 40 LOS), absorp- 
tion systems with column density exceeding N m J>10 15 ' 5 are 
rare ( £30). It is however crucial to reproduce them since 
they can contribute up to 50% of the flux power spectrum on 
scale k £0.01 skm -1 (e.g. Viel et al. 2004b). Although the 
column density distribution suffers from the shot noise as- 
sociated to these rare events, the left panel of Fig. 11 shows 
clearly that our simulations reproduce, at least qualitatively, 
these strong absorption systems. 

4-4-2 The line-width distribution 

The line- width distribution, /(&), which is the fraction of line 
with a given width, is shown in the right panel of Fig. 11 for 
the models plotted in the left panel. f(b) is computed from 
absorption systems with column density 10 12 ' 5 — 10 14 ' 5 cm~ 2 . 
The solid histogram is the data of Hu et al. (1995). The ob- 
served line-width distribution clearly exhibits a peak in the 
range b ~ 20 — 40 kms -1 . In this respect, Fig. 11 shows that 
in the simulations, the peak is more pronounced for a model 
with larger value of 7 at constant as- Furthermore, the scale- 
invariant model with lower normalisation (as = 0.72) and 
the RSI model account somewhat better for the amplitude 
of the observed peak, f(b) ~ 0.04. However, given the large 
measurement errors, using f(b) to constrain the parame- 
ter of the model does not seem feasible. Most importantly, 
it has been demonstrated that very high-resolution hydro- 



simulations are needed to reliably predict f(b) (Theuns et al. 
1998; Bryan et al. 1999; Theuns, Schaye & Haehnelt 2000). 
Consequently, it is difficult to draw any firm conclusion on 
the sensitivity of the line-width distribution to the shape 
and amplitude of the matter power spectrum of the best fit 
models. 



4.5 Systematic errors 

Various sources of error affect our analysis, among them sys- 
tematics in the data and in the modelling of the Lya forest. 
Regarding the data, note that the MOO measurements, ob- 
tained from a sample of 8 QSO only, are quite noisy. In 
the flux power spectrum for example, the data point at 
k — 0.00566 skm -1 lies well above the others. One might 
be worried that our results are strongly affected by this sin- 
gle data point. However, the corresponding error, ~20%, is 
larger than that of the other data points. For the best fit 
model with normalisation as = 0.9 shown in Fig. 7, the con- 
tribution of this data point to the chi-squared of the PS mea- 
surements (x 2 = 10.4) is Ax 2 = 2.86, smaller than that of 
the data point at k = 0.00713 skm -1 , for which A\ 2 = 4.45. 
We found indeed that the removal of this data point from 
the chi-squared does not affect the results noticeably. 

Continuum fitting errors are likely to affect the flux 
probability distribution. The modelling of these errors is 
complicated by the fact that the scales of interest are of 
the order of the box size L of the simulations. However, 
MOO were able to demonstrate that, if the inclusion of con- 
tinuum fitting errors can account for most of the discrep- 
ancy between the simulated and observed PDF in the range 
F £0.8, it should not affect much the PDF for F £0.8. We 
thus believe that the PDF in the range F £0.8 is robust to 
these errors. Continuum fitting errors might also affect the 
inferred column densities N m . Notwithstanding, the devia- 
tion from a single power-law in the regime N Hl £10 14 cm~ 2 
is most probably caused by line blending, and by the nonlin- 
ear evolution of the structures associated to the absorption 
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Figure 11. Left panel : The differential column density distribution /(JV HI ) of various models which fit the PS and PDF data best : 
two scale-invariant models of normalisation erg = 0.9 (solid and dottcd-dashed curve), a scale-invariant model with ag = 0.72 (long- 
dashed curve), and a RSI model (short-dashed curve). The parameter values of these models are (fcp, (F), 7, T,j)=(8.3,0.69,1.5,1.5), 
(6.3,0.69,1.1,1.5), (10,0.7,1.5,1.5) and (12.5,0.7,1.1,2) respectively. Note that the scale-invariant models have a temperature T4 = 1.5, 
while the RSI model has T4 = 2. The filled circles and squares are the data of Hu et al. (1995) and Petitjcan et al. (1993) respectively. 
Right panel : the corresponding line-width distribution /(&). Only the lines with column density N m = 10 12 ' 5 — 10 14 ' 5 cm~ 2 were taken 
into account. The simulated distributions are compared to the data of Hu et al. (1995) shown as an histogram. 



systems (e.g. Kim et al. 2002; see also Schaye 2001). Finally, 
continuum fitting errors should play a minor role in the flux 
power spectrum as the scales probed by our simulations are 
relatively small, k ^0.003 skm" 1 (e.g. Kim et al. 2004). 

Up to now we have adopted the prior (F) = 0.684 ± 
0.023 (MOO). However, one should bear in mind that the 
presence of metal lines or strong absorption systems can 
substantially affect the determination of (F) (e.g. Schaye et 
al. 2003; Viel et al. 2004b). In particular, Schaye et al. (2003) 
found that accounting for strong Hi systems and metal lines 
yields a lower value of (F) = 0.637 at z = 3. Furthermore, 
as shown by Seljak, McDonald & Makarov (2003) and Viel, 
Haehnelt &: Springel (2004), a wrong estimate of the mean 
flux level can severely bias the results. It is therefore pru- 
dent to examine to which extent the assumed mean flux 
level affects the results. For this purpose, we have repeated 
the analysis of Section §4.3.2 with a mean flux (F) — 0.637 
(Schaye et al. 2003), and a standard deviation ap — 0.030. 
Only the diagonal elements of the PDF error matrix were 
considered in the chi-squared minimization. Furthermore, 
Ax 2 was calculated from the PS data alone, and from the 
whole PS+PDF data set. We found that, in both cases, low- 
ering the mean flux level amounts to a larger best fit value of 
as, in agreement with previous studies (e.g. Viel, Haehnelt 
& Springel 2004). However, the effect is much weaker when 
Ax 2 is computed from the whole PS+PDF data set ( <;3%) 
than from the PS data alone ( <^8%). Combining the flux 
PS measurements with other statistics of the Lya forest can 
thus significantly reduce the sensitivity of the results to the 
mean transmitted flux. 

Our semi-analytical model of the Lya forest allows us 
to explore a much larger parameter space than full hydrody- 



namical simulations. However, the model has several short- 
comings. It neglects any possible scatter in the temperature- 
density relation of the low density IGM as a results of shocks 
and patch helium reionization (e.g. Gleser et al. 2005). It 
also assumes a filtering length that is independent of the lo- 
cal gas density and temperature. It also ignores any galactic 
feedback (e.g. Croft et al. 2002a; Kollmeier et al. 2003). Hy- 
drodynamical simulations predict that shock heating should 
drive a significant fraction of the baryons into the warm-hot 
phase of the intergalactic medium (WHIM) at low redshift. 
At the present epoch, this fraction might be as large as 40% 
(e.g. Cen & Ostriker 1999; Dave et al. 2001; see also Nath 
& Silk 2001). At redshift z ~ 3, however, this fraction falls 
below 10%. Moreover, numerical simulations also show that 
most of the WHIM baryons at that redshift resides in over- 
densities <5 g :>10 (Dave et al. 2001). Hence, shock heating 
should have a rather weak impact on the low density IGM 
at z w 3. Inhomogeneities in the UV background might also 
affect the flux power spectrum. Yet this effect should not 
be too important on the scales (k ^0.05 skm~ ) and red- 
shifts (z = 3) of interest (e.g. Croft 2004; McDonald et al. 
2004c). Recent measurements of the Lya absorption near 
Lyman-break galaxies (Adelberger et al. 2003) are taken 
as evidence for the existence of dilute and highly ionised 
gas bubbles caused by supernovae-driven winds. Notwith- 
standing, simulations indicate that their small filling factor 
results in a moderate impact on statistics of the Lya for- 
est such as the power spectrum or the PDF (e.g. Croft et 
al. 2002a; Weinberg et al. 2003; Desjacques et al. 2004; Mc- 
Donald et al. 2004c). Gleser et al. (2005) have demonstrated 
that patchy helium II reionization can cause significant scat- 
ter in the temperature density relation. A detailed account 
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Table 3. Parameter values of the models which best fit the 

flux PS and PDF, assuming a temperature-dependent smooth- 

~ 1/2 

ing fcpTg =const. The values listed in the upper half of the 
table were obtained using only the diagonal terms of the PDF 
error matrix whereas, in the lower half, they were obtained us- 
ing the full covariance matrix of the flux PDF. The filtering scale 
kp(l) is given in unit of /iMpc - 1 . The adiabatic index has a fixed 
value 7 = 1.3, while the mean IGM temperature is T4 = 1, 1.5 
and 2. The last columns gives the best fit values of (F), erg and 
the corresponding x 2 (for 24 degrees of freedom). 



filtering 


7 


f 4 


(F) 


erg 


x 2 


fe F (l) = 10 


1.3 


1 


0.68 


0.67 


24.9 






1.5 


0.69 


0.72 


21.7 






2 


0.69 


0.76 


25.3 


fc F (l) = 8 


1.3 


1 


0.68 


0.84 


22.4 






1.5 


0.68 


0.94 


25.4 






2 


0.68 


0.93 


38.7 


fc F (l) = 10 


1.3 


1 


0.70 


0.70 


26.9 






1.5 


0.71 


0.72 


23.9 






2 


0.71 


0.72 


27.7 


fcp(l) = 8 


1.3 


1 


0.71 


0.83 


23.8 






1.5 


0.71 


0.82 


27.3 






2 


0.71 


0.88 


38.1 



of the effect of this scatter on the estimation of cosmological 
parameters has not yet been taking into account neither in 
semi-analytic modeling nor in hydrodynamical simulations. 
A constant filtering length also lead to systematic errors 
in the inferred values of as (e.g. Viel, Haehnelt & Springel 
2004). In any case all of these effects are not expected to 
amount to more than an error of 20% in the estimated best 
fit parameters. 



5 DISCUSSION 

We have used N-body simulations of variants of the ACDM 
models to generate synthetic spectra of the Lya forest. We 
have simulated the standard ACDM cosmology with scale- 
invariant spectral index models, as well as a running spectral 
index model (Spergel et al. 2003). The one- dimensional flux 
power spectrum (PS) and flux probability distribution func- 
tion (PDF) have been computed from mock catalogs, and 
compared to the observational data in order to constrain 
the cosmological models and the physical parameters that 
dictate the properties of the Lya forest. In addition to the 
one- and two-point correlations, we have also computed the 
neutral hydrogen column density distribution and the line 
width distribution. These last two statistics have only been 
used as a consistency check for the models which give the 
best fit to the observed PS and PDF. 

The RSI model matches the PS somewhat better than 
our scale-invariant model with rms fluctuation ag, = 0.9, but 
overestimates the PDF in the range F >0.6 unless the tem- 
perature is very high, T4 ;>2. A scale-invariant model with 
(jg = 0.9 matches better the flux PDF but tends to overesti- 
mate Ap(fc) on scale k jjO.Ol skm" 1 when the temperature 

is large, T4^1.5. However, the agreement with the observed 
flux power spectrum can be improved by lowering the value 




Figure 12. A comparison between the A% 2 versus eg computed 
from the PS alone (curves with squares), and from the PS+PDF 
data (curves with triangles). A\ 2 was obtained after varying <rg 
and (F), and marginalizing over the value of the mean flux. The 
mean IGM temperature is T 4 = 1 (solid curves), 1.5 (long-dashed) 
and 2 (short-dashed). The filtering is fcp = 10 /iMpc^f^ 1 / 2 (cf. 
Section §4.3.2), and the adiabatic index has a fixed value 7 = 
1.3. The spline interpolation was performed using the diagonal 
elements of the PDF covariance matrix solely. 



of the normalisation amplitude, erg. Most importantly, mod- 
els which match best the PS data alone usually do not yield a 
good fit to the PDF. The discrepancy worsens with decreas- 
ing temperature. In the case of the ag = 0.9, scale-invariant 
cosmology, the disagreement with the observed PDF is par- 
ticularly severe for T4 J>2. We also computed the line statis- 
tics for some of our best fit models. We found that they all 
reproduce successfully the slope and normalisation of the 
column density distribution f(N HI ), as well as the overall 
line- width distribution f(b). The simulated line- width dis- 
tribution is probably affected by resolution effects. Yet the 
constraints on the primordial power spectrum are insensitive 
to the details of the line- width statistic. Degeneracies among 
the parameters diminish the ability of the data to constrain 
the shape and normalisation of the primordial power spec- 
trum (e.g. Zaldarriaga, Hui & Tegmark 2001; Zaldarriaga, 
Scoccimarro & Hui 2003). In this respect, we have found 
that the pairs 7 - fcp and T g - ag are degenerate when we 
marginalize over the mean transmitted flux. To proceed fur- 
ther, we have assumed that the filtering wavenumber fcp is 
related to the IGM temperature according to fc F oc T^ 1 ^ 2 . 
The reasons which motivate this choice are discussed in Ap- 
pendix §A. Under that assumption, a chi-squared minimiza- 
tion indicates that the normalisation amplitude is likely to 
be 0.7 <og <0.9 for a reasonable reionization scenario and 

temperatures 1 <JT4 <;2. For T4 ;>1.5, the data favour mod- 
els with normalisation as <C0.8 rather than models with 
ag J>0.8, the later slightly overestimating Ap(fc) in the range 
k <J0.01 skm -1 . It should also be noted that taking into ac- 
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count the full error matrix of the flux PDF systematically 
lowers the best fit values of as by a few percents. 

Our results suggest that the constraints inferred from 
measurements of the flux power spectrum alone might be 
biased. Most previous estimates of as from Lycv absorption 
measurement relied on the flux PS alone (e.g. Croft et al. 
1999, 2002; McDonald et al. 2000, 2004b; Viel, Haehnelt & 
Springel 2004; see also Viel, Weller & Haehnelt 2004; Sel- 
jak et al. 2004. In light of our results however, it is unclear 
whether their best fit models also match the PDF of the flux 
(and other statistics of the Lya forest as well). To emphasize 
the importance of combining several statistics we compare 
in Fig. 12 the constraints on as obtained from the PS data 
alone (curves with square) to those from from the whole 
PS+PDF data set (curves with triangles). We assumed a 
filtering wavenumber of fc F = lOT^ 1 / 2 /iMpc -1 . The curves 
A\ 2 were calculated for a mean IGM temperature T4 = 1 
(solid curves), 1.5 (long-dashed) and 2 (short-dashed), but 
for a fixed adiabatic index 7 = 1.3. Note that the best fit 
values of as in the case of the PS data only are consistent 
with those inferred in previous works. Fig. 12 clearly demon- 
strates that the inclusion of the PDF in the minimization 
has a large impact on the normalisation amplitude, decreas- 
ing the best fit value of as by about 20%. Consequently, 
the analysis of Viel, Haehnelt & Springel (2004) and Mc- 
Donald et al. (2004b) may overestimate the normalisation 
amplitude. Furthermore, including the PDF significantly im- 
proves the confidence on erg as we can see from the shape of 
Ax 2 - Finally, as we discussed in Section §4.5, it also notice- 
ably reduces the sensitivity of the results to the mean flux 
level (F). A joint fit of the flux power spectrum and PDF 
can thus greatly improve over the standard analysis based 
on the flux power spectrum alone. 

Although including the flux PDF tends to lower the 
best fit value of its, overall our result of us = 0.85 — 0.95 
is consistent with the studies of Viel, Haehnelt & Springel 
(2004), and McDonald et al. (2004b). Both of these stud- 
ies were based on flux measurements at several redshifts 
(McDonald et al. 2004b use eleven redshift bins spanning 
the range 2.2 < z < 4.2). They also relied on a more de- 
tailed treatment of the gas physics by means of hydrody- 
namical simulations. Our assumption of uniform smoothing 
does not account for the dependence of the smoothing on 
the local IGM density and temperature. Yet the exact rela- 
tion between the gas and dark matter depends on the rather 
poorly constrained thermal and reionization history of the 
universe. Simulations incorporating a wider variety of reion- 
ization histories are needed for a better understanding of the 
systematics in inferring the cosmological parameters from 
the forest. It is also unclear whether the currently available 
hydrodynamical simulations give reasonable flux PDFs. 

We have demonstrated that a joint analysis of the forest 
can potentially yield more robust estimates of the cosmo- 
logical parameters than those inferred from the flux power 
spectrum alone. Here we have included mainly the PDF and 
the flux power spectrum. But other statistical measures can 
also be used. The flux bispectrum (e.g. Mandelbaum et al. 
2003) offers a promising possibility as it should be less sen- 
sitive to systematics than the PDF. 
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APPENDIX A: THE FILTERING LENGTH 

Gas pressure smoothes the gas distribution relative to that 
of the dark matter. This effect becomes important below a 
(comoving) scale xj — 1/fcj, the Jeans scale, which is defined 
as (e.g. Bi, Borner & Chu 1992) 



xj 




2jk B T e 



0.176 h 



'Mpc ( 



7?4 



1/2 



(Al) 



where a is the scale factor, H the Hubble constant, fcg 
the Boltzmann constant and p the mean molecular weight. 
Here, 7 describes the temperature-density relation, T g = 
T g (l + <5 g ) 7_1 The numerical estimate was obtained for a 
ACDM cosmology with matter content fl m = 0.3, assuming 
p = 0.59, a value appropriate for a fully ionized plasma of 
primordial abundance. We should emphasize that eq. (Al), 
which defines the Jeans scale xj(t) in term of physical quan- 
tities evaluated at mean density, merely defines a character- 
istic scale. The filtering scale xf(tc, t) over which the IGM is 
smoothed will generally differ from the Jeans scale. For the 
moment, let us consider the filtering length at mean density, 
1/&F, which is also a function of time alone. The redshift evo- 
lution of for is complex, and depends on the details of the 
reionization scenario, i.e. on the whole time evolution of fcj. 
However, for some particular choices of reionization history 
(e.g. sudden reionization) , it is possible to work out analytic 
solutions to the linear equation governing the evolution of 
baryonic and dark matter in a EdS Universe (e.g. Peebles 
1984; Bi, Borner & Chu 1992; Gnedin & Hui 1998; Nusser 
2000; Matarrese & Mohayaee 2002). Under these assump- 
tions, Gnedin & Hui (1998) pointed out that, at redshift 
z = 3, the filtering length (at mean density) is fep = r)kj, 
with r\ ~ 1.5 — 2.5 for realistic reionization scenarios. These 
values should nonetheless not be taken too seriously given 
the uncertainties in the relation between baryons and dark 
matter. Taking 7 = 1.3 and T±=\ yields &f w 7n /iMpc -1 . 
Note that we divided &f by \/2 to account for our defi- 
nition of the filter, W = exp(— k 2 /2k F ), which reduces to 
~ 1 — k 2 /2k F in the limit of large wavenumber. 

In our analysis, we smooth the Fourier modes of the 
dark matter density field with a uniform Gaussian filter to 
obtain the gas density and velocity fields. In reality, we ex- 
pect the filtering length xp(x,t) to be a function of space 
and time through the local temperature and density. To get 
some idea of this dependence, let us consider a spherical 
perturbation of physical radius r = xf/{1 + z). On the one 
hand, the excess pressure will try to smooth out the per- 
turbation on a timescale r/cs, where cs is the local speed 
of sound. One the other hand, the enhanced density gives 
rise to an extra inward gravitational force (mainly due to 
dark matter) which tends to increase the matter content of 
the perturbation on a timescale Gpm, where p* is the 
dark matter density smoothed with a top-hat window of co- 
moving width xf. Equating these two timescales allows us 
to express xf as 

' ' =(! + *). (A2) 



xf = 



The density p* is not equal to the gas density. However, 
both are tightly related and have similar dependence on xf- 
Hence, if we assume that oc p g , and that the gas tem- 
perature and density follow the power-law relation discussed 
above, we have 

xf(z, Sg) = xf(z) (1 + 5 g ) 7/2_1 , (A3) 

where xf(z) is the filtering length at mean density. For an 
adiabatic index 1 <7 <1.6, eq. (A3) implies that i F depends 
weakly on the gas density contrast S g . Notwithstanding, a 
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uniform smoothing will lead to an overestimation (underesti- 
mation) of the gas density in the low (high) density regions 
of the simulation. We can however adjust the strength of 
the fc-space smoothing W, i.e. the value of fcF, such that 
the smoothed dark matter power spectrum is as close as 
possible to the true gas power spectrum. This is indeed the 
best we can do since it is practically impossible to mimic a 
non-uniform smoothing in real space with a fc-space kernel. 

To determine the optimal fcF, we assume that the gas 
density can be obtained locally from the dark matter density 
as follows : 



l + 5 g (x) 



-h 



d s y 



[i + My)]e 



2x f 



(A4) 



(2tt) 3 / 2 4 

In principle, the filtering length xf — xf(S s ) should de- 
pend on the local gas density, and eq. (A4) would implicitly 
define S g . For simplification however, we will assume that 
if = XF(S m ) = xf(1 + <5 m ) 7 ^ 2 ~ 1 (equation A3). The Fourier 
transform of the gas density field is simply given by 



* g (k)+&(k) 



~ J V< 



[l + <5 m (x)]e" 



(A5) 



where fo(k) is the Dirac delta function. The gas power spec- 
trum is then (ignoring the term at k = 0) 



^g(k) 



<(1 + S m ) (1 + C) e -| fc2 (4+^ 2 ) )e -*r (A6) 

where the fields <5 m , xf and 5'^, evaluated at position 

x and x + r respectively. P g (k) is thus the Fourier transform 
of a mass-weighted filter, 

Z s [k, r] = ((1 + S m ) (1 + C) e-^ 2 (4+4 2 )) . (A7) 

To estimate fcF, we consider the limit of small wavenumbers 
fc — > 0. Since the integrand is weighted by r 2 , we expect 
the main contribution to the integral to arise in the limit 
1 1- 1 — ► co. Expanding the previous equation to first order in 
fc and taking the limit |rj — > co, we have 



P(Sm)d5 n 
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with v — ln(l + Sm) + 0l/2, and where ox is the linear rms 
of dark matter fluctuations (Coles & Jones 1991). Although 
this approximation is inaccurate when ox ^1, it should be 
adequate enough to assess whether fcF is larger or smaller 
than fcF. The calculation yields 



1 



2 \(7"2)(3- 7 )/2 
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where we have used 1 + ct^l = exp(cr 2 ). Note that, since xf 
depends on the local gas density through relation (A3), we 
should expect ctnl to be a smoothed version of the dark mat- 
ter rms fluctuation amplitude, i.e. a function of fcF - Nonethe- 
less, equation (A12) clearly shows that, for an adiabatic in- 
dex 1 <J7 <:1.6, the mass- weighted filtering wavenumber fcF 
is smaller than that at mean density. This follows from the 
fact that most of the mass resides in moderately high-density 
regions where the temperature T g is significantly larger than 
T g (e.g. Bi & Davidsen 1997). Consequently, given that i) 
/cf oc fcj and ii) /of < far, we can place an upper limit on 
the filtering fcF- For the particular choice 7 = 1.3, one has 
fc F <7rjf 4 ^ 1/2 /iMpc -1 . Bearing in mind that the IGM tem- 
perature is most probably larger than 10 4 K at z — 3 (e.g. 
Schaye et al. 2000), we believe that A;f <J14 /iMpc -1 should 
be a reliable lower limit on the amount of filtering at 2 = 3 
for a reasonable history of the Universe. 



Z g [k, 



(1 + £ m (r)) - T ((l + S m ) (1 + C) (4 + Zf 2 )> 



= (l + €m(r)) 



1 - k 



2 {(l+8m)(l+S' m )x F ) 



l + £ m (r) 
(1 + Cm(r)) [l-fc 2 ((l + «5 m )4}] , 



(A8) 



since the cross-terms (S'^xp) and (SmXpS^), and the two- 
point correlation £ m tend to zero for large separation. Hence, 
in the limit fc — > 0, the gas power spectrum can be expressed 
as 



^g(k) 



hp 



Pm(k) 



(A9) 



where fcF is an effective filtering length which differs from 
the filtering length k F — l/x F at mean density, 



kl 



{{l + 5 m )xl) 



1 



dS m V(8 m ) (1 + S m ) 



7-2 



(AlO) 



Here, V(5 m ) is the one-point probability distribution of the 
dark matter density field. Eq. (A10) shows that, in the linear 
regime, fcF is obtained from a mass- weighted average of the 
local filtering length xf ■ To evaluate fcF , we will assume that 
the dark matter PDF is well approximated by a lognormal 
distribution, 
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